[57] ABSTRACT

The computer document audio access and conversion sys- 
tem allows a user to access information originally formatted 
for audio/visual interfacing on a computer network via a 
simple telephone. Of course, files formatted specifically for 
audio interfacing can also be accessed by the system. A user 
can call a designated telephone number and request a file via 
dual-tone multi-frequency (DTMF) signalling or through 
voice commands. The system analyzes the request and 
accesses a predetermined document. The document may be
in a standard document file format, such as hyper-text 
mark-up language (HTML) which is used on the World 
Wide Web. The document is analyzed by the system, and 
depending on the different types of formats used in the
document, information is translated from an audio/visual
format to an audio format and played to the user via the
telephone interface. The document may contain links to 
other documents which can be invoked to access such other 
documents. In addition, the system can have a native com- 
mand capability which allows the system to act indepen- 
dently of the accessed document contents to replay a docu- 
ment or carry out functions similar to those available in 
conventional web browsers. 

38 Claims, 4 Drawing Sheets 
COMPUTER NETWORK AUDIO ACCESS
AND CONVERSION SYSTEM 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to accessing information from a
computer network via a telephone, PDA equipped with an 
audio input/output or other portable device, speaker phone, 
or other audio device. More specifically, this invention 
relates to dynamically converting standard document 
formats, such as hyper-text mark-up language (HTML), 
standard generalized mark-up language (SGML), HyTime,
and electronic mail (E-mail), for use in an audio interface, 
locally or over a telephony network. 

2. Discussion of the Related Technology 

Voice mail and other interactive voice response (IVR) 
systems allow a user to access audio information stored in a 
computer memory such as a hard disk. Typically, the audio 
information is stored in audio files created either by the user
or for the user. Conventional IVR systems use dual-tone 
multi-frequency (DTMF) signalling to allow the user to 
interact with the server through a standard telephone key- 
pad. Pre-recorded audio information is available on IVR 
systems in the form of instructional phrases such as "Please 
type in your account number followed by the pound sign." 

Pre-recorded audio is also used for introductory phrases 
such as "Your account balance is ..." At this point, the IVR
computer may access a connected database that stores the 
requested account balance in numerical format, convert the 
numerical format to an audio format using a numerical 
text-to-speech engine, and state the account balance. This 
conversion from numerical format to audio format is 
extremely rigid and completely predefined. IVR systems are 
"closed" in that each IVR system is uniquely designed, not 
connected to a computer network, and IVR systems cannot 
be used interchangeably. Also, these IVR systems are 
designed specifically for audio interaction. 

In contrast, audio/visual information on an audio/visual
server in a computer network may be accessed using a 
personal computer. For example, a World Wide Web (Web) 
page on the Internet may be accessed using a computer 
linked through an Internet access provider, such as America 
On Line™ or Prodigy™, to a Web server. In certain
situations, however, use of a computer may not be feasible 
or access to a computer may not be possible. For example, 
a cellular telephone user driving an automobile may want to 
know about traffic in the surrounding area; however, the user
cannot operate a computer while in the car. In situations such 
as this, an audio interface may be useful for obtaining 
information from the Internet or another computer network. 

Other situations where an audio interface to a computer 
network may be useful include accessing an electronic 
calendar on a local area network (LAN) to receive or modify
an itinerary, accessing E-mail on the Internet or a wide-area 
network (WAN) while away from a computer, and request- 
ing a telephone number from an electronic yellow pages or 
white pages while at a pay phone. An audio interface to the 
Web could also be used to traverse the Internet and obtain 
information residing on various Web servers. 

Thus, there is a need for flexible access to various types 
of computer networks via an audio interface. There is a need 
for interactive telephone access to a computer network. 
There is also a need for dynamic conversion of an audio/ 
visual file format to a pure audio format.

SUMMARY OF THE INVENTION 

The computer network audio access and conversion system
allows a user to access information originally formatted
for audio/visual interfacing on a computer network via a
simple telephone. Of course, information formatted
specifically for audio interfacing, such as information from
voice mail and other IVR systems, can also be accessed by the
system. A user can call a designated telephone number and
request information via DTMF signalling or through voice
commands. The system analyzes the user's request,
establishes a connection with a target computer network, and
finds and retrieves the requested information in a standard
document file format, such as HTML, which is used on the
World Wide Web. The document file is analyzed by the
system, and depending on the different types of structures
used in the file, information is translated from an
audio/visual format to an audio format and played to the user
via the telephone interface. Typically, the system will use a
text-to-speech engine to convert the document to audio
information.

For example, if a Web page is returned from the Internet,
the title of the Web page may be read in a low male voice.
Headline information (or text formatted above a certain
typesize) may be read in a female voice. General text
information (or text formatted below a certain typesize) may
be read in a different voice. A hyper-text link may be read in
a contrasting voice, or a bell sound may be used to indicate
a hyper-text link. Hyper-text lists may be read to the user in
a menu format with an opportunity for the user to select a list
entry following the speaking of each entry by the system.
This may be accomplished by passing the document through
a parser to interpret its contents. The document may then be
passed through a text-to-speech engine to read the text. The
engine may be responsive to the parser in order to select the
voice that is used. In addition, the parser will select what
portions of the document are converted to speech.
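
By way of non-limiting illustration, the parser-driven voice
selection described above might be sketched as follows. The sketch
assumes Python's standard html.parser module. The voice labels, the
tag-to-voice mapping, and the function name tag_voices are
hypothetical and are not specified by the system; a real embodiment
would hand each segment to a text-to-speech engine.

```python
from html.parser import HTMLParser

# Hypothetical voice labels. The description only says that titles,
# headlines, general text, and links may be read in contrasting voices.
VOICE_FOR_TAG = {
    "title": "low_male",
    "h1": "female", "h2": "female", "h3": "female",
    "a": "contrast",
}
DEFAULT_VOICE = "general"


class VoiceTaggingParser(HTMLParser):
    """Collects (voice, text) segments for a text-to-speech engine."""

    def __init__(self):
        super().__init__()
        self.segments = []
        self._stack = []  # currently open tags that carry a special voice

    def handle_starttag(self, tag, attrs):
        if tag in VOICE_FOR_TAG:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        if self._stack:
            voice = VOICE_FOR_TAG[self._stack[-1]]
        else:
            voice = DEFAULT_VOICE
        self.segments.append((voice, text))


def tag_voices(html):
    """Return the document as a list of (voice, text) segments."""
    parser = VoiceTaggingParser()
    parser.feed(html)
    parser.close()
    return parser.segments
```

Each returned pair names the voice to use and the text to speak, so
the speech engine remains responsive to the parser as described.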

Throughout the speaking of the Web page, a user may
interact with the system through DTMF signalling or voice
control. For example, the user may press 1 to indicate the
selection of a hyper-text link during a one second period
after a hyper-text link is indicated. Or the user may speak a
list entry after the speaking of a hyper-text list to select a
hyper-text link.

The user interaction is the mechanism by which a user
navigates between and within the documents. To assist the
user, the system may present navigation options in the form
of a menu or simply by using a recognized voice or other
audio signal to designate navigational options. The
system may permit navigation based on the content of a
document or other criteria.

The user command may be given by DTMF signals or other
recognized signaling methods, or by voice response. The
voice response system may be a voice recognition system
where the voice recognition will attempt to match a speech
input to a preselected list of potential selections or choices.
The preselected list can be thought of as the dictionary of
words that the voice recognition system will recognize. The
voice response system may alternatively be a speech-to-text
system which will simply convert a user command to text
which will then be used by the system to control navigation.
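
As an illustration only, matching an input against a preselected
list might be sketched as follows. The sketch assumes the speech
input has already been converted to text, and it substitutes string
similarity from Python's standard difflib module for the acoustic
matching a voice recognition system would perform. The function name
match_command and the cutoff value are hypothetical.

```python
import difflib


def match_command(recognized_text, vocabulary, cutoff=0.6):
    """Return the vocabulary entry closest to the input, or None.

    String similarity stands in here for acoustic matching; the
    cutoff is an assumed tuning value, not one given by the system.
    """
    normalized = {entry.lower(): entry for entry in vocabulary}
    hits = difflib.get_close_matches(
        recognized_text.lower().strip(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None
```

An input with no sufficiently close entry returns None, at which
point the system could repeat its prompt.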

According to an advantageous feature, the system will
respond to a user input to navigate the document or documents
based on content of the document. All documents may
contain content which is useful for navigation. For example,
an HTML document may contain tags designating links or
portions of a document. The system will attempt to navigate
to a location corresponding to a user command and effect
any action possible at that location. For example, if the
command corresponds to a link, then the system will take the
linked action (such as play an audio file or jump to another
document). If the command corresponds to textual content,
the system will skip to the next occurrence corresponding to
the command in the document. For example, in a document
containing a list of stock symbols and quotes, if a user inputs
a command corresponding to a stock symbol, the system
may skip ahead to the symbol and begin "reading" at the
location of the stock symbol.
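
The content-based navigation just described might be sketched, under
stated assumptions, as follows. The sketch assumes the parser has
already divided the document into segments; the Segment type, the
navigate function, and the action names are hypothetical, and simple
substring matching stands in for whatever correspondence test an
embodiment would use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class Segment:
    text: str
    link_target: Optional[str] = None  # set when the segment is a link


def navigate(
    segments: List[Segment], position: int, command: str
) -> Tuple[str, Union[str, int]]:
    """Resolve a user command against the content after `position`.

    Returns ("follow", target) if the next matching segment is a link,
    ("read_from", index) if it is plain text, and ("none", position)
    if nothing later in the document matches.
    """
    needle = command.lower()
    for index in range(position + 1, len(segments)):
        segment = segments[index]
        if needle in segment.text.lower():
            if segment.link_target is not None:
                return ("follow", segment.link_target)
            return ("read_from", index)
    return ("none", position)
```

For a list of stock quotes, a command naming a symbol yields the
index at which "reading" would resume.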

Also, advanced intelligent network (AIN) features may be
incorporated into the system to allow access to individual
user profiles using caller identification (ID) information,
location profiles using location ID information, user
preferences, and sensitive networks using a combination of
caller ID, password, and voice recognition information.
Additionally, an AIN connection may be used to designate
a home page for an individual user, define other preferences,
or enhance security by implementing encryption or
communicating encryption keys.

Accordingly, it is an object of the invention to provide an
audio information presentation system for accessing and
navigating through electronic documents and presenting
information contained in documents which are not
constrained by an audio compatible format. It is a feature of
the invention to permit access to electronic information, which
is not specifically formatted for audio retrieval, without the
requirement of a traditional computer access device. The
system is suited for accessing information contained on a
computer network, including the Internet, without a
computer terminal. Information may be accessed over the
telephone by people without a computer or individuals with
special needs such as those who may have difficulty using a
computer, i.e., visually impaired, mobility impaired, or
individuals with other requirements that make using a
computer difficult.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the software and hardware architecture of a
preferred embodiment.

FIG. 2 shows the system architecture of a preferred
embodiment.

FIG. 3 shows an advanced intelligent network
implementation of a preferred embodiment.

FIG. 4 shows an advanced intelligent network
implementation that may be used to implement long distance
telephone access across a network.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

FIG. 1 shows the software and hardware architecture of a
preferred embodiment. A standard telephone 10, either
analog (POTS) or digital (ISDN), may be connected to the
architecture 100 using a standard POTS or ISDN telephone
line. The architecture 100 is then connected through a
computer network 15, such as the Internet, to various servers
18, 19, which may be Web servers running hyper-text
transfer protocol (HTTP). Other networks, such as WANs
and LANs, and other servers, such as FTP servers and LAN
servers, may be connected to a telephone using the
architecture.

The architecture 100 is shown in layers to denote an
equivalency to the layered architecture model of
International Standards Organization Open Systems
Interconnection (ISO OSI). Above the physical layer is an
operating system 101 such as UNIX or a variation of UNIX. A
communication protocol 119, such as transmission control
protocol/Internet protocol (TCP/IP), resides above the
operating system. Various hardware boards for features such as
telephony 111 and DTMF detection 112, text-to-speech
conversion 113, speech-to-text conversion 114, text file
decompression 115, audio file decompression 116, and audio
file playing 117, also reside in this layer. Other boards 118
may be added to provide additional capabilities.

In one embodiment, a Dialogic™ D/41D board,
manufactured by Dialogic Corp. in Parsippany, N.J., handles the
telephony, DTMF detection, and audio file playing features.
A DECtalk™ speech engine, manufactured by Digital
Equipment Corp. in Maynard, Mass., handles the
text-to-speech conversion. Dragon Dictate™ available from Dragon
Systems, Inc. or Direct Talk™ available from IBM may be
used to handle speech-to-text conversion for voice command
and control.

Above this layer, software applications such as standard
network software libraries 121 are provided in order to
handle network communications. One example of such a set
of libraries suited for the World Wide Web
(WWW) is published by the National Center for
Supercomputing Applications (NCSA). A typical library may
include the following modules:

open bookmark files
store bookmarks
delete bookmarks
edit bookmarks
save bookmarks
handles mime types
handle cgi scripts
various character sets
handle interrupts from the computer and networks
download URL's
download files with various options
follow changed URL's to new designations
handles forms, indices, passwords, encryption schemes, etc.
handles different forms of user input
connects to news services, internet news, USENET bulletin boards, etc.
gopher connections
save previous URL's and cache/uncache the previously visited network designations
toggle source/presentation for current document
reload the current document
pipe the current document to an external command
quit the browser
quit the browser unconditionally
view the next page of the document
view the previous page of the document
go back two lines in the document
go forward two lines in the document
refresh the screen to clear garbled text
go to the beginning of the current document
go to the end of the current document
make the previous link current
make the next link current
move up the page to a previous link
move down the page to another link
move right to another link
move left to a previous link
display a list of previously viewed documents
go back to the previous document
go to the document given by the current link
go to a document given as a URL
display help on using the browser
display an index of potentially useful documents
force resubmission of form if presently cached
interrupt network transmission
return to the first screen (home page)
display and change option settings
allow searching of an index
search within the current document
search for the next occurrence
send a comment to the author of the current document
edit the current document
display information on the current document and link
display choices for printing the current document
add to your personal bookmark list
delete from your personal bookmark list
view your personal bookmark list
escape from the browser to the system
download the current link to your computer
toggle tracing of browser operations
show other commands in the novice help menu
go directly to a target document or action
display the current key map
create a new file or directory
remove a file or directory
modify the name or location of a file or directory
tag a file or directory for later action
display a full menu of file operations
upload from your computer to the current directory
install file or tagged files into a system area
report version
toggle a checkbox

Network browser software 122 is also provided. An
example of such a browser is available from Netscape
Communications, Inc. under the name Netscape
Navigator™. The system may also include searcher software 123 in
order to assist in locating and indexing documents. Other
software 131 is provided for controlling the inputs and
outputs of the various boards 111-118. Preferably, all of the
software and hardware architecture 100 resides on a single
machine, such as a DEC Alpha™ 1000 4/233 machine.

FIG. 2 shows the system architecture of a preferred
embodiment. The condition of a telephone 10 connected to
a telephone line coming into the system 200 is analyzed
according to a standard telephony interface such as the
Dialogic™ D/41D board or an equivalent. For example, the
telephony board 111 (shown in FIG. 1) detects whether the
telephone 10 is on hook, off hook, busy, ringing, or in
another telephony state. A user can initiate connection of a
telephone to the system by taking the telephone off hook and
dialing a telephone number. When a telephone 10 is
connected to the system 200, the Call Manager 210 software
implemented on a computer directs the audio file player 270
to recite a voice prompt, such as "You have reached the
Audio Web Connection. Please press 1 for local weather
information. Please press 2 for local traffic information.
Please press 3 for national sports information." This voice
prompt may be stored as an audio file, a text file, a
compressed audio file, or another type of file. Submenus
may also be provided to request information such as the
geographic location from which the user is calling or the
sport or sports team in which the user is interested.

Alternatively, the initial connection can be to access a
system document or home page. The system home page may
include introductory information and links to other
documents such as weather, traffic, or sports. In addition, there
may be links to a master directory, a search engine such as
Web Crawler™ or any other document, applet, or other
function or place permitted by the network protocol, if any.

When a document is accessed, it will be processed
advantageously through a parser 230. The parser 230 will interpret
the content of the document. Headings, labels, text,
graphics, audio information, comments, and other types of
content will be identified for the call manager 210 to handle
appropriately. Labels, document names, and text may be
passed through a text-to-speech converter. Headings and
graphics may be either ignored or a signal, such as a tone,
may be presented to signify such content. Audio information
may be played. Applets may be stored for later execution.
Links may invoke a signal to the user identifying
information as a link and be processed by the text-to-speech
converter. After the entire document is processed or "read," the
system may prompt the user with predetermined options
such as next document, previous document, terminate/hang
up, or help.

The commands or signals from the user's telephone
10 are captured by the Call Manager 210 and sent to a
translator 220 that translates the user's commands from
DTMF signals to a subject word or phrase, such as
"Washington D.C. area weather," "Silver Spring, Md. traffic," or
"Baltimore Orioles," using a DTMF detection board 112
(shown in FIG. 1). Alternatively, voice command and
control could be used to request information. For example,
instead of entering a number or alphanumeric sequence from
a DTMF keypad to select a particular sports team, a user
may say, "Baltimore Orioles." This voice information would
be translated 220 by a speech-to-text engine 114 (shown in
FIG. 1) or interpreted by a voice recognition system, and the
system would interpret the voice command as a request for
the most recent Baltimore Orioles baseball score. Of course,
more advanced and complex voice command and control
options may be used to gather information from the user.
Generally, the system will attempt to interpret the user
command and then attempt to navigate based on the
command. Once the numeric, alphanumeric, or voice command
information from the telephone 10 is translated, the subject
word or phrase is passed to the Call Manager 210. At this
point, the user may choose to invoke a search for related file
addresses on the computer network. Otherwise, a
predetermined audio-compatible address is selected by the system.

The Call Manager 210 then routes this information to the
Parser 230, which is a sophisticated software program. The
Parser may either match a predetermined file address, stored
in memory, to the subject word or phrase or send the subject
word or phrase to Searcher 240, which could be a computer
program such as Lycos™ or Web Crawler™, to find
addresses of files on a target computer network 15 relating
to the subject word or phrase. For example, if the target
computer network is the Internet, the Searcher may find
uniform resource locators (URLs) of Web pages relating to
the subject phrase "Washington D.C. area weather." A
searcher may be outside of the system 200 (as shown) or part
of the system itself (not shown).

If a search is conducted and more than one address is
returned by the searcher, the file addresses from the searcher
are transformed into an audio menu so that the user may
select a single address. A searcher returns an unordered list
in HTML, which is transformed into an audio menu by the
system. Preferably, the audio menu recites the total number
of addresses found by the searcher. Then the audio menu
may give instructions for the user to press a DTMF keypad
number or say a number corresponding to a menu item,
recite the numbers and their corresponding menu items, and
process any received signals using DTMF detection board
112 (shown in FIG. 1) or speech-to-text conversion board
114 (shown in FIG. 1). If no menu item has been selected,
the audio menu may present additional options to the user,
such as recite the menu items again or conduct another
search for addresses.
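
As a sketch only, the transformation of a searcher's unordered HTML
list into an audio menu might look as follows. The sketch assumes
each list entry is a link whose text serves as the menu item; the
class and function names and the prompt wording are hypothetical,
and the returned lines are text that would still need to pass
through a text-to-speech converter.

```python
from html.parser import HTMLParser


class ListLinkParser(HTMLParser):
    """Collects (label, address) pairs from <a href=...> entries."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._label = []

    def handle_data(self, data):
        if self._href is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._label).split())
            self.entries.append((label, self._href))
            self._href = None


def build_audio_menu(html):
    """Return (entries, lines): the addresses and the menu text to speak."""
    parser = ListLinkParser()
    parser.feed(html)
    parser.close()
    lines = [f"{len(parser.entries)} addresses were found."]
    for number, (label, _) in enumerate(parser.entries, start=1):
        lines.append(f"Press or say {number} for {label}.")
    return parser.entries, lines


def select_entry(entries, digit):
    """Map a DTMF digit, or a spoken number converted to text, to an address."""
    if digit.isdecimal() and 1 <= int(digit) <= len(entries):
        return entries[int(digit) - 1][1]
    return None  # no valid selection; the menu could be recited again
```

The first spoken line gives the total number of addresses, matching
the preferred behavior described above.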

Preferably, the audio menu recites descriptive information
returned from the searcher as part of the unordered list. The
user may select the amount of descriptive information
recited. For example, the user may choose for the system to
recite one sentence, or speak for two seconds, or continue to
recite until a specified DTMF signal is sent.

Once a single address is selected by the user, or if only one
address is found by the Searcher, or if the file address is
predetermined, the Parser passes the address to Browser 250
which establishes a connection to the appropriate server 18
through the network 15. Once the connection is established,
the Browser 250 downloads the entire requested file and
passes the file to the Parser 230. The Parser dynamically
analyzes the structure and contents of the downloaded file.
For example, if the file is a Web page, the Parser may
determine the title of the Web page, find a table, mark
occurrences of hyper-text links, find ordered and unordered
lists, distinguish images, find captions, denote paragraphs
and numbers, locate various abbreviations, and detect
compressed or uncompressed audio or audio/video files. If the
file is an E-mail message, the Parser may parse header
information such as the "From:" field, the "To:" field, the
"CC:" field, the time stamp, routing information, and
forwarding information. The Parser may also locate the body of
the message, appended text, graphics, or audio files, and
other segments of standard E-mail formats such as Lotus
Notes™ or X.400 or X.500 international standards.

Other standard formats for computer files such as full text
databases, ASCII databases, word processing files, and
scheduling and itinerary files, may be analyzed by the Parser
for audio conversion. The Parser may be modified to analyze
any standard file format so that any requested file in that
standard format can be converted for an audio interface.

For each file segment, the Parser 230 passes the structure
type and the associated text or audio contents to the Call
Manager 210, which routes it to the appropriate board to
create an audio file to be played by audio file player 270. For
example, a compressed text segment may be sent from
Parser 230 through Call Manager 210 to a text file
decompressor 261 and a text-to-speech convertor 260 for
translation into an audio file. The audio file would then be routed
through Call Manager 210 to audio file player 270. In
another example, a list would be sent to a board that would
create an audio menu, and the menu would be transmitted
through the Call Manager 210 to the audio file player 270 for
speaking to the user. Or a compressed audio or audio/video
segment could be sent to an audio file decompressor 262 to
create a decompressed audio file that would be sent to the
audio file player. Of course, uncompressed audio could be
sent straight to the audio file player. Other methods of
routing and transforming files and file segments into an
audio format may be used. For example, a text-to-speech
convertor output could be bridged directly to a telephone
line instead of creating an audio file for playing by the audio
file player.

Throughout the speaking of audio files by the system, the
user may interact with the system using either DTMF
signalling or voice command or both. Various DTMF or
voice commands, or a combination of both, may be used to
traverse across a document, file or several files. For example,
a user may press an alphanumeric DTMF sequence to
indicate that the user wants to use a hyper-text link to jump
to another file. Alternatively, the user may speak a voice
command to repeat a certain section of text or otherwise
traverse up and down a file. Or the user could use DTMF
signalling to go back to a previously accessed file. Or the
user may press a DTMF number to request the name and
address of the file so that the user can access an audio/visual
version of the file from the user's computer at a later time.

Note that video images, graphics, and other
non-compatible content, although retrieved from the target server
and copied to the Parser, will not usually be passed out of the
Parser, because the audio interface cannot handle video
images. Images, however, may be routed to an alternate
delivery site at the user's command. For example, a user
may hear a caption being read and know that a picture of a
weather front is available on the requested file. The user may
then request that this picture be sent to the user's facsimile
machine or computer (not shown) via DTMF signalling or
voice commands.

If, however, an image file has an HTML or other standard
format indication that it contains text, such as a facsimile
file, the system may use other boards 118 (shown in FIG. 1)
such as a character recognition board to convert the image
file to a text file. Then, the system may read the text file to
a user using the text-to-speech conversion board 113 (shown
in FIG. 1).

FIG. 3 shows an Advanced Intelligent Network (AIN)
implementation of a preferred embodiment. An AIN has
been developed that overlays ISDN facilities and provides a
variety of service features to customers. Because an AIN is
independent of ISDN switch capabilities, AIN services can
easily be customized for individual users. U.S. Pat. Nos.
5,418,844 and 5,436,957, the disclosure of which is
incorporated by reference herein, describe many features and
services of the AIN. AIN may use intelligent peripherals
(IPs) to implement the system. Bellcore protocol 1129+, or
another appropriate protocol, may be used to establish a
communication link between an IP and other machines in the
AIN. An IP, such as a speech IP, could handle all
speech-to-text and text-to-speech conversions within the system.
Also, a server IP could handle all interactions of the system
with a computer network.

A telephone 10 is connected to a central office 310, which
handles telephony interfacing, that is connected to an
intelligent signal control point (ISCP) 320 via SS7 signalling.
Residing in the ISCP 320 is Call Manager 210 and translator
220, which handles the interactions between the computer
network, the requested file, and the user. The Call Manager
could reside elsewhere in the system, such as in a server IP.

In a World Wide Web embodiment, the ISCP 320 interacts
with Web Server IP 350 via standard 1129+ specifications,
which contains a parser 230, a searcher 240, and a browser
250 as previously described. A Server IP would also contain
a presentation manager 355, which would determine the
user's equipment and format the presentation of information
from the network appropriately. For example, if a user's
equipment was a personal computer rather than a telephone,
the presentation manager would modify the presentation
format for an audio/visual interface rather than an audio
interface.

For an E-mail embodiment, an E-mail server IP may be
connected to an ISCP. Similarly, for a LAN or WAN
embodiment, a LAN server IP or a WAN server IP may be
connected to an ISCP. Other server IPs may be connected to
specific computer networks as needed.

The ISCP 320 is also connected, according to 1129+
specifications, to a speech IP 340 for conversion of speech
to text 360, text to speech 361, and DTMF detection 362.
The speech IP is connected to the telephone 10 through SS7
lines to central office 310.

Preferably, file server memory 330 is connected to ISCP
320 to reduce traffic across the AIN; however, the file server
may be connected at any point in the AIN. File server
memory 330 contains user profiles and location profiles that
direct the creation of custom reports. For example, a certain 
user regularly checks the closing price of a certain stock and 
the traffic report for the area near the user's house in Silver 
Spring, Md. before leaving the office. Instead of traversing 
several system menus and submenus to access the desired 
information, that user may have a profile that directs the
initial prompt from the system to be, "Press 1 for the Bell 
Atlantic stock closing price and the traffic report for the 
Silver Spring area. Press 2 for other menu options." The AIN 
network could be aware of the availability of a user profile 
through caller ID or other AIN identification features. 
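
For illustration only, selecting the initial prompt from a user
profile keyed by caller ID might be sketched as follows. The
dictionary layout, the function name initial_prompt, and the example
telephone number are hypothetical; the prompt wording follows the
examples given in this description.

```python
from typing import Dict, List, Optional

DEFAULT_PROMPT = (
    "Please press 1 for local weather information. "
    "Please press 2 for local traffic information. "
    "Please press 3 for national sports information."
)


def initial_prompt(caller_id: Optional[str], profiles: Dict[str, List[str]]) -> str:
    """Return the greeting for a caller.

    `profiles` maps a caller ID to the reports in that user's custom
    report (an assumed storage format for file server memory 330).
    """
    reports = profiles.get(caller_id) if caller_id else None
    if not reports:
        return DEFAULT_PROMPT  # unknown caller or empty profile
    combined = " and ".join(reports)
    return f"Press 1 for {combined}. Press 2 for other menu options."
```

A caller without a stored profile simply hears the general menu.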

A location profile is similar to a user profile, but instead 
of depending on a user's personal identity, the location 
profile depends on a telephone's geographic location. For 
example, a user who is a traveling salesperson regularly uses 
the system to check traffic reports from a cellular telephone 
in the car. A location profile could be triggered by a location 
ID from the user's cellular telephone and produce an initial 
prompt that says, "Press 1 for the traffic report for your area. 
Press 2 for other menu options." For a non-cellular 
telephone, a caller ID could indicate location information or 
a location ID. The location ID would be passed to the AIN 
along with the user's DTMF or voice command signals, and 
the translated subject word or phrase could include the 
geographic location corresponding to the location ID. 

These caller and location IDs could be used to ensure 
secure access to sensitive networks or sensitive files. For 
example, a firewall software program may interact with a 
server IP so that only users with authorized caller IDs are 
allowed to access a particular network. Other security 
arrangements, such as password protection or voice 
recognition, can also be used by the AIN to restrict access to 
certain files or networks. Additionally, the AIN may interact 
with a computer network to ensure proper identification and 
encryption of financially sensitive information, such as 
credit card numbers or electronic bank account codes. 
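
A caller ID check of the kind described above might look like the following sketch. The network names, the telephone numbers, and the choice to treat networks without an entry as unrestricted are assumptions made for illustration; the description does not specify how the firewall program stores or evaluates authorized caller IDs.

```python
from typing import Optional

# Hypothetical table of restricted networks and the caller IDs allowed on each.
AUTHORIZED_CALLERS = {
    "corporate-intranet": {"3015550100", "3015550101"},
}


def may_access(network: str, caller_id: Optional[str]) -> bool:
    """Return True if the caller may reach the network.

    Assumption: a network with no entry in the table is unrestricted.
    A restricted network requires a caller ID that appears in its set,
    so a call with no caller ID is refused.
    """
    allowed = AUTHORIZED_CALLERS.get(network)
    if allowed is None:
        return True
    return caller_id in allowed
```

Password protection or voice recognition, which the description also mentions, would be additional checks applied after this one.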

FIG. 4 shows an advanced intelligent network implemen- 
tation that may be used to implement long distance tele- 
phone access across a network. A user at telephone 10 could 
request a long distance connection over a computer network 
15 and then input the telephone number of the desired 
telephone 40 using DTMF signalling or voice commands. 
Once ISCP 320 receives instructions from the user through 
central office 310, Server IP 350 establishes a connection to 
the specified telephone 40 across a network 15 through 
Server IP 450, ISCP 420, and central office 410. Audio 
information from user's telephone 10 is properly formatted 
and placed in packets by speech IP 340 for transmission 
across network 15. Server IP 450 receives the packets of 
audio information from network 15, and ISCP 420 in con- 
junction with speech IP 440 decodes the packets to establish 
a long distance telephone call to telephone 40. Audio infor- 
mation from telephone 40 is transmitted to telephone 10 in 
a similar manner after a connection is established. 
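
The packet step performed by speech IP 340 and reversed at the far end can be illustrated with a small sketch. The packet layout (a sequence number plus a payload) and the 160-byte payload size are assumptions chosen for the example; the description does not specify a packet format or an audio encoding.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AudioPacket:
    seq: int        # position of this packet in the audio stream
    payload: bytes  # slice of the encoded audio


def packetize(audio: bytes, payload_size: int = 160) -> List[AudioPacket]:
    """Split encoded audio into sequence-numbered packets."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        AudioPacket(seq, audio[offset:offset + payload_size])
        for seq, offset in enumerate(range(0, len(audio), payload_size))
    ]


def reassemble(packets: Iterable[AudioPacket]) -> bytes:
    """Rebuild the audio stream, tolerating out-of-order arrival."""
    ordered = sorted(packets, key=lambda packet: packet.seq)
    return b"".join(packet.payload for packet in ordered)
```

The sequence number lets the receiving side restore the original order if the network delivers packets out of order. Lost packets and timing are not handled in this sketch.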

According to the invention, a network search engine may 
be provided to investigate documents located on a relatively 
unconstrained network such as the World Wide Web in order 
to locate documents which are highly compatible with audio 
presentation or even documents which are specifically
labeled to be compatible. The document search may be 
automated along with the process of indexing such docu- 
ments. The index may be built and reside on the central 
system for ease of access by a user by invoking a local 
search command. 

The search engine may be a worm type searcher or other
robotic type search engine. It may be one of the application
software packages 131 or may be performed by a separate
computer. A parser, together with a call manager, can
interrogate the documents that are found and determine if
the documents reach a threshold level of compatibility. 
Compatibility can be determined by lack of non-audio or 
audio-translatable components or other objective criteria. 
Compatible documents may be indexed. The index may be 
stored as one or more documents preferably in a hierarchical 
order. The user may use the navigation commands to 
traverse the index and invoke a link to a source document. 
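
One way to picture the threshold test is the sketch below, which scores an HTML document by the share of its tags that have no obvious audio rendering and indexes only the documents that meet a threshold. The tag list, the scoring formula, and the 0.8 threshold are assumptions for illustration; the description leaves the objective criteria open.

```python
from html.parser import HTMLParser
from typing import Dict, List

# Assumed set of tags with no obvious audio rendering.
NON_AUDIO_TAGS = {"img", "table", "form", "applet"}


class TagCounter(HTMLParser):
    """Count all start tags and the subset treated as non-audio."""

    def __init__(self) -> None:
        super().__init__()
        self.total = 0
        self.non_audio = 0

    def handle_starttag(self, tag, attrs):
        self.total += 1
        if tag in NON_AUDIO_TAGS:
            self.non_audio += 1


def compatibility(html: str) -> float:
    """Return the fraction of tags that are not in NON_AUDIO_TAGS."""
    counter = TagCounter()
    counter.feed(html)
    counter.close()
    if counter.total == 0:
        return 1.0  # plain text has nothing that blocks audio presentation
    return 1.0 - counter.non_audio / counter.total


def build_index(documents: Dict[str, str], threshold: float = 0.8) -> List[str]:
    """Return the sorted addresses of documents that meet the threshold."""
    return sorted(
        address
        for address, html in documents.items()
        if compatibility(html) >= threshold
    )
```

A flat sorted list stands in for the index here; the hierarchical ordering that the description prefers is not modeled.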

A system according to the invention may be built on a 
TCP/IP platform. The system may be used not only for 
accessing Web pages on an Internet network which uses 
hyper-text transfer protocol (HTTP), but also for accessing
E-mail on a Novell™ IPX/SPX or other network, accessing 
files via file transfer protocol (FTP) or Gopher, accessing 
files via asynchronous transfer mode (ATM), or any other
computer network that supports TCP/IP. The system may 
also be modified to encompass standard voice mail formats 
and other IVR systems. 

This system may, of course, be carried out in specific 
ways other than those set forth here without departing from 
the spirit and essential characteristics of the invention. 
Therefore, the presented embodiments should be considered 
in all respects as illustrative and not restrictive and all 
modifications falling within the meaning and equivalency 
range of the appended claims are intended to be embraced
therein. 

We claim: 

1. An interface system for presenting one or more com- 
puter documents in an audio format and navigating through 
said documents, comprising: 

an audio interface for receiving a user command;

a call manager connected to the audio interface for
controlling the routing of information to and from the
audio interface;

a translator connected to the call manager for translating
the user command into a subject word or phrase;

a browser connected to the call manager for retrieving a
document identified by a link from a computer network
related to the subject word or phrase;

a parser connected to the call manager for parsing the
document into file segments according to the standard
format;

an audio file player connected to the call manager for
playing audio file segments contained in the document
to the audio interface.

2. An interface system according to claim 1 wherein the 
user command comprises a dual-tone multi-frequency sig- 
nal. 

3. An interface system according to claim 2 wherein the 
user command comprises a user voice command.

4. An interface system according to claim 1 wherein the 
user command comprises a user voice command.

5. An interface system according to claim 4 further 
comprising a voice recognition engine associated with said 
call manager. 

6. An interface system according to claim 4 further 
comprising a speech-to-text converter associated with said 
call manager. 

7. An interface system according to claim 1 further 
comprising: 

a searcher connected to the call manager for searching a 
computer network for file addresses of files related to 
the subject word or phrase. 

8. An interface system according to claim 1 further
comprising:

a computer memory connected to the parser for storing
predetermined file addresses.

9. An interface system according to claim 1 further
comprising:

a transformer connected to the call manager for
transforming a non-audio file segment into an audio file
segment.

10. An interface system according to claim 9 wherein the
transformer comprises a text-to-speech converter.

11. An interface system according to claim 10 wherein the
transformer comprises a file decompression unit.

12. An interface system according to claim 9 wherein the
transformer comprises an audio decoder.

13. An interface system according to claim 1 wherein the
translator comprises a dual-tone multi-frequency detector.

14. An interface system according to claim 13 wherein the
translator comprises a speech-to-text converter.

15. An interface system according to claim 1 wherein the
translator comprises a speech-to-text converter.

16. An interface system for presenting one or more
computer documents in an audio format and for navigating
through said documents comprising:

(A) an intelligent signal control point comprising:

(1) a call manager for controlling the routing of
information to and from an audio interface; and

(2) a user command interpreter connected to the audio
interface;

(B) a server intelligent peripheral connected to the
intelligent signal control point comprising:

(1) a browser for retrieving documents identified by a
link from one or more computer storage facilities; and

(2) a parser for parsing the document into segments
according to the content of the document; and

(3) a presentation manager for directing the
presentation of the segments; and

(C) a speech intelligent peripheral connected to the
intelligent signal control point comprising:

(1) a speech-to-text converter; and

(2) a text-to-speech converter.

17. An interface system according to claim 16 wherein the
server intelligent peripheral further comprises a searcher for
searching the computer storage facilities for documents
according to a predetermined criteria and indexing
documents that satisfy said criteria along with an address of said
documents.

18. An interface system according to claim 16 further
comprising computer memory connected to the intelligent
signal control point.

19. An interface system according to claim 18 wherein the
memory contains a caller identification for identifying a user
of the audio interface.

20. An interface system according to claim 18 wherein the
memory contains a location identification for identifying the
location of the audio interface.

21. An interface system according to claim 16 wherein the
user command comprises a dual-tone multi-frequency
signal.

22. An interface system according to claim 16 wherein the
user command comprises a user voice command.

23. An interface system according to claim 22 further
comprising a voice recognition engine associated with said
call manager.

24. An interface system according to claim 22 further
comprising a speech-to-text converter associated with said
call manager.

25. A document navigation and audio presentment method
comprising the steps of:

accessing a computer document;

interpreting content of the document;

converting segments of the document to audio
information based on the interpretation of the content of the
segments;

navigating through said document responsive to user
commands; and

invoking actions dictated by content of said document in
response to a user command.

26. A method according to claim 25 wherein the step of
converting comprises the step of playing audio files.

27. A method according to claim 25 wherein the step of
converting comprises the step of text-to-speech conversion.

28. A method according to claim 27 wherein the step of
converting comprises imposing different audio
characteristics on different types of segments.

29. A method according to claim 25 wherein the step of
invoking content based actions comprises the step of
accessing linked documents when the content is a link to a second
computer document.

30. A method according to claim 25 wherein the step of
accessing comprises accessing locally stored documents.

31. A method according to claim 25 wherein the step of
accessing comprises accessing remotely stored documents.

32. A method according to claim 25 wherein the step of
accessing comprises accessing documents stored in a
computer network.

33. An audio interface system comprising:

a document access and retrieval unit associated with one
or more computer document storage facilities;

a parser associated with said access and retrieval unit
which identifies the format of segments of retrieved
documents based on the type of content contained in
said segments;

one or more audio output devices responsive to the
parser, wherein said audio output devices convert
segments of said document to audio information in
accordance with the format of said segments;

a linker capable of retrieving documents identified by a
link when the content contained in said segments is a
document link.

34. An audio interface system according to claim 33
where the audio information is a signal suitable to be played
to a user through a telephone.

35. An audio interface system according to claim 33
further comprising:

a command response unit, responsive to a user command,
for controlling document presentation.

36. An audio interface system according to claim 35
wherein said access and retrieval unit is connected to said
command response unit.

37. An audio interface system according to claim 35
wherein said command response unit is responsive to
content of said document subject to user commands.

38. An audio interface system according to claim 37
wherein, responsive to a user command, said command
response unit will instruct the access and retrieval unit to
retrieve a second document based on the content of a first
document.

* * * * *